1. Introduction

1.1 Abstract 

With the release of HP-UX 10.20, Hewlett-Packard introduces support for files larger than 2 GB on 32-bit
machines. In past releases the maximum size for a file has been 2 GB. This implementation adheres to
the specifications outlined by the Large File Summit, a group of OS vendors and application developers.
Large files are supported in many areas of the HP-UX operating system, including standard library
calls, the kernel filesystem interface, and many appropriate commands.

HP-UX 11.0 is the first HP-UX release to be delivered as a 64-bit OS. The information in this document
was originally written for using 64-bit values on a 32-bit OS. The items discussed in this paper still apply
to the 32-bit OS version of HP-UX 11.0 and to 32-bit applications (including those supporting large
files) on a 64-bit OS. Many of the items also apply to 64-bit applications on a 64-bit OS; where
differences occur, they are noted.

1.2 Purpose of Document 

The purpose of this document is to announce the availability of large files, to explain the implications of
enabling large files on your system, and to give developers guidance on how to account for and take
advantage of large files.

1.3 Intended Audience 

This document is intended for system administrators who need to provide large files on their HP-UX
systems, as well as developers of applications and libraries that will either take advantage of large
files or will need to co-exist with large files.

1.4 Overview of Document 

“Section 3. Enabling Large Files” and “Section 5. Commands Support of Large-Files” will be of particular
interest to system administrators, as they describe how to enable large files on your system and which
commands work with large files. “Section 6. Application Development” is required reading for any developer
who will have an application or library on a large-file-enabled system.

1.5 Related Documentation 

The following references were used in the preparation of this white paper. The reader is urged to consult
them for more information. All of these documents are available via http at:

http://www.sas.com/standards/large.file/index.html

• [1] “Overview of the Large File Support effort” 

• [2] “Draft Specifications for Large File Support” 

• [3] “HP’s Large Files and the Large File Summit December, 1995” - (See Appendix 1) 
2. Overview

This chapter gives a list of product features and covers several key points for application developers. This
chapter does not describe impacts to applications; these are covered later in the document.

2.1 Product Features 

Included below is a list of the features provided by large files for HP-UX:

• Support file sizes greater than 2 GB on HP-UX.

• Remain binary compatible. No change is required to existing applications as long as they do not
access large files.

• Provide new data types, data structures, and macros to support large files.

• In addition to the existing POSIX API, support a new API for HP customers.

• Support API error handling: an error is returned whenever an API cannot return the correct result
of an operation.

• Provide a new compile option, _FILE_OFFSET_BITS, that can be used to set the compile
environment to 64 bits or 32 bits. (See Section 6.1.2 Compile Environments)

• Provide two new interfaces, ftello and fseeko, that are made available by the
_LARGEFILE_SOURCE compile option. (See Section 6.1.1.3 New Non-POSIX API)

• Provide one additional compile option, _LARGEFILE64_SOURCE, to provide the new API. (See
Section 6.1.1.4 New 64-bit Specific Interfaces and Data types)

• Provide two announcement macros, _LFS_LARGEFILE and _LFS64_LARGEFILE, to determine
if _LARGEFILE_SOURCE and _LARGEFILE64_SOURCE are supported. (See Section 6.1.4 Determining
if large-files are supported)

• Supported by appropriate HP-UX commands.

• Support POSIX compliance.


2.2 Product Limitation 

The HP-UX 10.20 release supports a maximum file size of 128 GB. It is expected that the maximum file
size will increase in future releases of HP-UX. Large files are not supported by the NFS or DFS filesystems
in HP-UX 10.20. Large file support has been added for NFS in HP-UX 11.0.

2.3 Application Interfaces to Large Files

Some steps must be completed before an application can access a large file.

• The application source code needs to be modified properly.

• The application needs to be compiled with different compile options to access large files.

For applications that wish to support large files, two interface mechanisms are provided. First, a set of new
system calls (new API) is available for applications. Second, a 64-bit compile environment
(_FILE_OFFSET_BITS=64) is provided for application development, which modifies the size of various
data types and structures in the existing POSIX API to support large files.
New 64-bit API

HP-UX adds a set of 64-bit data types, system calls, and libc routines in the form of interface64() to support
large files. This new API is like the existing POSIX API, except that all applicable file sizes, offsets, and
block counts are 64 bits. It also contains a new version of open() (open64()) that succeeds on files of
any size. The new API was defined by a group of OS and application suppliers.

An application using the new API will be compiled in the 32-bit compile environment. This means the compiler
views data types such as off_t as 32-bit, and it is left to the application to pass 64-bit data types around
within the program properly to access large files.

In order to avoid name space pollution in user space, the _LARGEFILE64_SOURCE compile option must
be specified to use the new API. Please note that the _LARGEFILE64_SOURCE compile option will include
the interfaces provided by the _LARGEFILE_SOURCE compile option.

The new API is intended for C programmers; if you use another language, please refer to the man pages
and/or compiler documentation.
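
As an illustration of the new 64-bit API, the following is a minimal sketch rather than a definitive implementation. It assumes that open64(), lseek64(), and off64_t are exposed by <fcntl.h>, <unistd.h>, and <sys/types.h> once _LARGEFILE64_SOURCE is defined; the man pages remain the authority. The helper name seek_past_2gb() is hypothetical, and the macro is defined in the source only to keep the fragment self-contained (this paper passes it as a compile option).

```c
/* Sketch only: request the explicit *64 interfaces. This must come
 * before every #include so the headers see it. */
#ifndef _LARGEFILE64_SOURCE
#define _LARGEFILE64_SOURCE
#endif

#include <assert.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

/* seek_past_2gb() is a hypothetical helper, not an HP-UX interface.
 * It opens (or creates) path with open64() and positions the
 * descriptor at 3 GB, an offset that does not fit in a 32-bit off_t.
 * Returns the resulting offset, or -1 on error. */
off64_t seek_past_2gb(const char *path)
{
    const off64_t target = (off64_t)3 * 1024 * 1024 * 1024;
    off64_t pos;
    int fd;

    assert(path != NULL);

    fd = open64(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    /* off_t may still be 32 bits in this compile environment, so the
     * application carries the offset in off64_t explicitly. */
    pos = lseek64(fd, target, SEEK_SET);
    close(fd);
    return pos;
}
```

Only the seek is performed, so no data blocks are written. The point of the sketch is that the application, not the compiler, is responsible for using the 64-bit types consistently.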

64-bit compile environment 

Note: 64-bit compile environment refers to compiling a 32-bit application with 64-bit extensions; it does not 
refer to a 64-bit application that will only run on a 64-bit OS. 

An application can be compiled in a 64-bit compile environment by setting the _FILE_OFFSET_BITS compile
option to 64 (_FILE_OFFSET_BITS=64). This mechanism automatically changes the size of various
data types and structures from 32 bits to 64 bits. For example, off_t, normally 32 bits, is 64 bits in this
environment, and lseek in this environment expects a 64-bit offset of type off_t.

For HP-UX 11.0, _FILE_OFFSET_BITS is set to 64 by default when compiling a 64-bit application (using
+DDLP64).

Complete details of the application conversion guideline can be found in “Section 6. Application
Development.”
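
The effect of the 64-bit compile environment can be sketched as follows. This is an illustrative fragment under stated assumptions, not the conversion procedure itself: seek_to_3gb() is a hypothetical helper, and _FILE_OFFSET_BITS is defined in the source only so the fragment stands alone (normally it would be supplied as -D_FILE_OFFSET_BITS=64 on the compile line).

```c
/* Sketch only: select the 64-bit compile environment. This must come
 * before every #include so the headers see it. */
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64
#endif

#include <assert.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

/* seek_to_3gb() is a hypothetical helper. It uses the ordinary
 * open()/lseek() names; in this environment they operate on a
 * 64-bit off_t. Returns the resulting offset, or -1 on error. */
off_t seek_to_3gb(const char *path)
{
    off_t target;
    off_t pos;
    int fd;

    assert(path != NULL);
    /* Guard against the macro having been defined too late. */
    assert(sizeof(off_t) == 8);

    target = (off_t)3 * 1024 * 1024 * 1024;

    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return (off_t)-1;

    pos = lseek(fd, target, SEEK_SET);
    close(fd);
    return pos;
}
```

Compared with the explicit interface64() calls, no function or type names change here; only the compile option does.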

New non-POSIX API 

HP-UX adds two new libc routines, fseeko() and ftello(), as a part of the large files implementation
because ftell() and fseek() cannot be extended to support large files. These interfaces are provided
in the 32-bit and 64-bit compile environments by invoking the _LARGEFILE_SOURCE compile option. Please
note that the _LARGEFILE64_SOURCE compile option includes the interfaces provided by the
_LARGEFILE_SOURCE compile option. More detailed information about these interfaces is available in
“Section 6. Application Development”.

2.4 Protecting existing applications through filesystem administration

One feature of large files is to provide safe areas for existing applications that may not correctly behave if 
they encounter a large file. Various commands allow the system administrator to control whether or not a 
large file may be created on a file system. 

File systems may be marked and mounted, specifying whether or not they are to contain large files. The
fsadm command allows the system administrator to selectively enable and disable the creation of large
files on individual file systems. mkfs may also be used to enable large files when creating a file system.
Once a file system has been marked to enable large files, using the above commands, mount is then used
to mount the file system. All of these commands accept the largefiles and nolargefiles options.

Any attempt to create a large file on a file system that has not been appropriately enabled with either
fsadm or mkfs will result in the same error as previous releases of HP-UX (prior to 10.20). This option
protects existing applications and allows them to continue working without any source changes.

Complete details of the administration commands implementation may be found in “Section 3. Enabling
Large Files”.
3. Enabling Large Files

3.1 Default on system is small files 

Large files is a technology that must be explicitly enabled. A system will not support large files just because 
it has been updated to a release of HP-UX that supports large files. The advantage of this is that if large 
files are not needed, then they do not need to be enabled on the system and everything will continue to 
work as it has in the past. 

3.2 Creating a large-files filesystem 

Creating a large-files filesystem can be done with the mkfs command or the newfs command. The newfs
command is a friendly interface to the mkfs command, and they both use the same options to create
large-files and no-large-files filesystems. As of this release, the default behavior of these commands is to
create a no-large-files filesystem. However, this default could be changed in a future release of HP-UX.
Therefore, it is a good idea to explicitly set either the largefiles or the nolargefiles option.

The following examples show how to create a large-files filesystem.

• /usr/sbin/mkfs -F hfs -o largefiles /dev/vg02/rlvol1

• /usr/sbin/newfs -F hfs -o largefiles /dev/vg02/rlvol1

• /usr/sbin/mkfs -F vxfs -o largefiles /dev/vg02/rlvol1

• /usr/sbin/newfs -F vxfs -o largefiles /dev/vg02/rlvol1

The following examples show how to create a filesystem that will not support large files.

• /usr/sbin/mkfs -F hfs -o nolargefiles /dev/vg02/rlvol1

• /usr/sbin/newfs -F hfs -o nolargefiles /dev/vg02/rlvol1

• /usr/sbin/mkfs -F vxfs -o nolargefiles /dev/vg02/rlvol1

• /usr/sbin/newfs -F vxfs -o nolargefiles /dev/vg02/rlvol1

3.3 Changing a file system from one to the other 

HP-UX also provides the ability to change a filesystem back and forth between large files and no large
files. This is provided by the fsadm command, which also provides other filesystem administration
capabilities. It is important to realize that the conversion of these filesystems must be done on an unmounted
filesystem, and fsck will be called after a successful conversion.

The following example shows how to convert a no-large-files filesystem to a large-files filesystem.

• /usr/sbin/fsadm -F hfs -o largefiles /dev/vg02/rlvol1

While the conversion of a no-large-files filesystem to a large-files filesystem should always succeed, the
same is not true for converting a large-files filesystem to a no-large-files filesystem. The latter will only
succeed if there are no large files on the filesystem. If even one large file is detected on the filesystem being
converted, then the fsadm command will not convert the filesystem. Therefore, if it is necessary to convert
a large-files filesystem that actually has large files on it to a no-large-files filesystem, the large files must be
removed before conversion.

The following example shows how to convert a large-files filesystem to a no-large-files filesystem.

• /usr/sbin/fsadm -F hfs -o nolargefiles /dev/vg02/rlvol1
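
Because conversion to no-large-files is refused while any large file remains, it can help to locate such files while the filesystem is still mounted, before unmounting it for fsadm. The following is a minimal sketch of one way to do this and is not an HP-supplied tool. The names count_large_files() and SMALL_FILE_MAX are hypothetical, and the fragment assumes nftw() is declared by <ftw.h> when _XOPEN_SOURCE is set to 500.

```c
/* Sketch only: both macros must precede every #include. */
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64     /* so st_size can describe large files */
#endif
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 500        /* assumed to expose nftw() */
#endif

#include <assert.h>
#include <ftw.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Largest size a small (32-bit offset) file can have: 2 GB - 1. */
#define SMALL_FILE_MAX ((off_t)0x7fffffff)

/* Shared with the callback; this sketch is not reentrant. */
static long large_file_count;

/* Called by nftw() for every object found in the tree walk. */
static int visit(const char *path, const struct stat *sb,
                 int type, struct FTW *ftwbuf)
{
    (void)ftwbuf;
    if (type == FTW_F && sb->st_size > SMALL_FILE_MAX) {
        printf("%s\n", path);
        large_file_count++;
    }
    return 0;   /* keep walking */
}

/* Hypothetical helper: prints and counts regular files larger than
 * 2 GB - 1 under mount_point. Returns the count, or -1 if the walk
 * could not be performed. */
long count_large_files(const char *mount_point)
{
    assert(mount_point != NULL);
    large_file_count = 0;

    /* FTW_PHYS: do not follow symbolic links.
     * FTW_MOUNT: stay within this filesystem. */
    if (nftw(mount_point, visit, 16, FTW_PHYS | FTW_MOUNT) != 0)
        return -1;
    return large_file_count;
}
```

A result of zero suggests the conversion should not be blocked by existing large files; fsadm still performs its own check.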

3.4 Mount Protection 

The mount command has been modified to support large-files filesystems and provides the system
administrator with a method of ensuring that no large-files filesystems are mounted on the system. The mount
command uses the same two options as the mkfs, newfs, and fsadm commands (largefiles and
nolargefiles). mount will not mount a large-files filesystem if the -o nolargefiles option is
specified. Conversely, the mount command will not mount a no-large-files filesystem if the -o largefiles
option is specified. If no option is provided to mount, it will use the state of the filesystem itself to
determine whether it is mounted as largefiles or nolargefiles. The following table summarizes the expected
mount results:


Table 1: Mount Results

Mount Command                                                      Filesystem type   Result
/usr/sbin/mount -F hfs -o largefiles /dev/vg01/lvol2 /tmp/test     no-large-files    Fails
/usr/sbin/mount -F hfs -o nolargefiles /dev/vg01/lvol2 /tmp/test   no-large-files    Pass
/usr/sbin/mount -F hfs -o largefiles /dev/vg01/lvol2 /tmp/test     large-files       Pass
/usr/sbin/mount -F hfs -o nolargefiles /dev/vg01/lvol2 /tmp/test   large-files       Fails
/usr/sbin/mount -F hfs /dev/vg01/lvol2 /tmp/test                   no-large-files    Pass: no-large-files
/usr/sbin/mount -F hfs /dev/vg01/lvol2 /tmp/test                   large-files       Pass: large-files
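
The rule behind Table 1 can be restated as a small decision function. This is only a model of the table for illustration; mount_outcome() and the two enumerations are hypothetical names, not HP-UX interfaces, and the function does not mount anything.

```c
#include <assert.h>
#include <stdbool.h>

/* Option given to mount with -o, if any (hypothetical names). */
enum mount_opt { OPT_NONE, OPT_LARGEFILES, OPT_NOLARGEFILES };

/* Outcome column of Table 1 (hypothetical names). */
enum mount_result { MOUNT_FAILS, MOUNTED_NOLARGEFILES, MOUNTED_LARGEFILES };

/* Models Table 1: an explicit option must agree with the state of
 * the filesystem; with no option, the filesystem state decides. */
enum mount_result mount_outcome(bool fs_is_largefiles, enum mount_opt opt)
{
    assert(opt == OPT_NONE || opt == OPT_LARGEFILES ||
           opt == OPT_NOLARGEFILES);

    if (opt == OPT_LARGEFILES && !fs_is_largefiles)
        return MOUNT_FAILS;
    if (opt == OPT_NOLARGEFILES && fs_is_largefiles)
        return MOUNT_FAILS;

    return fs_is_largefiles ? MOUNTED_LARGEFILES : MOUNTED_NOLARGEFILES;
}
```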
4. Backing Up Large Files

Providing the ability to create and modify large files is not enough. HP-UX must provide some basic ability
to back up and recover files greater than 2 GB. There are a number of tools that perform this task.

4.1 Backup utilities that support large files

The following backup utilities will back up large files:

1. dd

2. fbackup/frecover

Both of these commands require no user intervention to back up large files. If a backup contains large files
and an attempt is made to restore the files on a filesystem that does not support large files, the large files
will be skipped. fbackup and frecover use a new format for this release, so a backup tape created
on HP-UX 10.20 or later cannot be restored on a release of HP-UX prior to 10.20.

4.2 Backup utilities that do not support large files 

4.2.1 tar, cpio, pax, ftio 

Some of the backup commands, specifically tar, cpio, pax (tar and cpio formats), and ftio
(because it creates cpio format archives), are restricted from supporting large files due to
standards-defined headers in the archives. Although the headers allow archival of files up to 8 GB, there is no
guarantee that there will be no attempt to restore these files on a system that does not support large files.
These commands will therefore support files up to 2 GB only. Attempts to archive any files larger than 2 GB
will fail, and the files will not be added to the archive.
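
The 8 GB figure follows from how the archive headers store the file size. As a worked example, and assuming the POSIX ustar layout in which the size field holds 11 octal digits (an assumption about the header format, not a statement about HP-UX internals), the largest recordable size is 8^11 - 1 bytes, one byte short of 8 GB. The helper max_octal_value() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: largest value that fits in the given number
 * of octal digits, i.e. 8^digits - 1. Each octal digit is 3 bits. */
uint64_t max_octal_value(unsigned digits)
{
    assert(digits > 0 && digits <= 21);   /* 21 * 3 = 63 bits */
    return (UINT64_C(1) << (3 * digits)) - 1;
}
```

For 11 digits this gives 8589934591, while 8 GB is 8589934592 bytes.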

4.2.2 dump, restore 

dump/restore are large file aware. If dump finds the largefile feature bit ‘on’ in the super block, it will
issue an appropriate error message and terminate. This will be true even if the file system does not contain
any large files. If the largefile (and largeuid) feature bits are not on, dump will function as normal, and
restore will extract from the dump archives.
5. Commands Support of Large-Files

This section of the specification examines the core HP-UX commands support for large files. First, we
discuss the rationale and assumptions behind the decisions, then list the commands that will support large
files. Finally, we discuss some key implementation details associated with the commands.

5.1 Rationale 

For the near future, we expect that large files will be an exception, rather than commonplace. Therefore, it
is not necessary to change the entire set of commands to support large files. Rather, we should consider
modifying a well thought-out subset of the commands. To help reduce the number of commands that must
support large files, we can make the following assumptions about the use of large files:

1. Executables, log files, pattern files and shell scripts will continue to be small. We also do not
expect that binary programs will become greater than 2 GB any time soon. Log files of this size are
usually too large to manage and should be trimmed.

2. Interactive editing will continue to use small files. The operating system may begin to experience
obvious performance problems if a user vi’s a large file.

3. Printing files will continue to be limited to small files. Although the need has not yet surfaced,
printing large graphic files may be required sometime in the future.

4. Mail will continue to support small files only. Mail must be packaged for receipt by any system and
there is no way to guarantee a target system’s mail handler supports large files. In addition, current
protocols do not support multi-megabyte files very well.

5. Tools that package files for exchange between multi-vendor UNIX systems will continue to operate
on small files only. Commands such as shar cannot guarantee that the target system would be
able to unpack the archive. However, these types of tools may need to support large files soon
after the feature is available on many other vendors’ systems.

To further help us select the commands that support large files, we also consider future standards work.
Some of the commands, such as tar, are not extensible under current standards to support 128 GB files.
In such cases, we have chosen not to deviate from the standards. Other commands, such as pack/unpack,
are marked to be withdrawn from the standards either because they are duplicates of other commands
that are considered ‘more standard’ or because they no longer have any relevant use. These types of
commands have not been enhanced to support large files.
5.2 Commands Supporting Large Files

5.2.1 File System Administration Commands

All of the filesystem administration commands for HFS and JFS support large files. The complete list is as
follows:

bdf          fsclean        mvdir
chroot       fsdb           ncheck
clri         fstyp          newfs
convertfs    fuser          rmboot
devnm        getext         setext
diskinfo     labelit        sync
disksecn     link           syncer
dumpfs       mkboot         tunefs
extendfs     mkfs           umount
ff           mklost+found   umountall
fsadm        mount          unlink
fscat        mountall       volcopy
fsck                        vxdiskusg


5.2.1.1 mkfs 

See “Section 3. Enabling Large Files” 

5.2.1.2 newfs

See “Section 3. Enabling Large Files”

5.2.1.3 mount

See “Section 3. Enabling Large Files”

5.2.1.4 fsck

The fsck(1M) command repairs a damaged large-files filesystem. The external interface of the command
is the same; however, the brief discussion below describes the changed behavior of the command.

The primary superblock of the filesystem is considered to be accurate after the fsck(1M) command
verifies it with a secondary superblock. In case the two superblocks do not match, the fsck(1M)
command does not proceed with filesystem consistency checks until the user specifies an alternate primary
superblock that is identical to the secondary superblock. This is an existing behavior.

Typically, large files should not manifest in a no-large-files filesystem. However, fsck(1M) must recover
from this situation. The first scenario uses the interactive mode. fsck(1M) finds a large file on a
no-large-files filesystem, marks the filesystem dirty and stops. The system administrator then corrects the situation
using the fsadm(1M) command to turn on the large-files feature bit. The filesystem would then be
repaired and available for mounting. This would preserve the large file, if fsck did not find it corrupt in any
other way.
In non-interactive mode, the large file on a no-large-files filesystem would be purged. fsck assumes the
superblock to be accurate based on its accuracy checks. The probability of a superblock being corrupt is
insignificant when compared to the instance of a large file manifesting in a no-large-files filesystem.
Consequently, fsck will remove the large file from a filesystem it believes should not contain large files.

5.2.1.5 fsadm

See “Section 3. Enabling Large Files”

5.2.2 Backup Commands

See “Section 4. Backing Up Large Files”

5.2.3 Accounting and Quotas

All of the accounting and quota commands fully support large files through their scanning and reporting
mechanisms. They have the ability to recognize large files and report their usages. However, the reporting
mechanisms of these commands will continue to create only small files (as stated in the Rationale section
of this chapter).

5.2.4 File System User Commands 

All of the filesystem user commands support large files. A complete list is as follows:

basename     du             pathchk
chgrp        file           prealloc
chmod        find           pwd
chown        getaccess      rm
cp           ln             rmdir
df           ls             touch
dirname      mkdir          whereis
             mv             which


5.2.5 Shells 

All three of the HP-UX shells (POSIX shell, Korn shell, and C shell) support large files through
redirection. The default of each shell is to use a large open for redirection. If the ‘open’ occurs on a
filesystem that is enabled for large files, a large file may be created. If not, then the file is limited to 2 GB. There is
no switch (environment variable) that allows the user to toggle between large and small file opens for
redirection. This guarantees that all large file applications will always get the appropriate file descriptor and
will not fail because of an environment switch the user may have failed to set. The only area where a user
may get into trouble is when a non-largefile application is running in a large-file mounted filesystem and
encounters a file greater than 2 GB through a pipe or redirection. This is considered a very slight risk since
most applications of this type do not seek through the data, but rather process it sequentially.
5.2.6 Text Processing Commands

Below is a list of handy commands that may be used to process files that are large. Each will appropriately 
handle large data files. However, pattern files for commands such as awk and sed will continue to remain 
small. There are no new options in this set of commands. 


awk          expand         shar
bdiff        fgrep          sort
cmp          fold           split
comm         grep           strings
csplit       head           sum
cut          hyphen         tail
egrep        join           tr
             paste          unexpand
             sed            uniq


5.2.7 Misc. Commands 

Below is the list of various commands that do not fall into any particular category, but may be useful with 
large files. They support large files as well: 


cat          page           uudecode
cksum        od             uuencode
compress     tee            wc
mktemp       uncompress     xd
                            zcat


5.3 Compatibility 

The large file support in the commands does not affect the operation of existing command options. We are 
not obsoleting or changing any of the command interfaces, only introducing new features and a few new 
command-line options. There is no impact on binary compatibility. 

All of the commands use, in some way, the new interfaces that enable them to operate on large files. This 
interface exists only on HP-UX systems from HP-UX 10.20 forward. Moving any of these commands that 
have been enabled for large files backwards to execute on a pre-10.20 system will fail. 

5.4 Behavior of commands that do not support Large Files 

If commands that do not support large files are run against a large file, the command will return an 
[EOVERFLOW] error and print a message similar to the following: 

Value too large to be stored in data type 
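
The behavior described above can be sketched from the point of view of a program built in the default (32-bit) compile environment. This is an illustrative fragment under stated assumptions, not taken from any HP-UX command: report_size() is a hypothetical helper, and the exact message text printed by strerror() may differ between systems.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper, deliberately built without
 * _FILE_OFFSET_BITS=64. Returns 0 if the size was reported, 1 if the
 * file is too large for this compile environment (EOVERFLOW), and
 * -1 on any other error. */
int report_size(const char *path)
{
    struct stat sb;

    assert(path != NULL);

    if (stat(path, &sb) == 0) {
        printf("%s: %lld bytes\n", path, (long long)sb.st_size);
        return 0;
    }

    if (errno == EOVERFLOW) {
        /* st_size cannot represent the size of this file. */
        fprintf(stderr, "%s: %s\n", path, strerror(errno));
        return 1;
    }

    perror(path);
    return -1;
}
```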
6. Application Development

This section discusses key concepts of large-files, conversion choices when dealing with large-files,
conversion examples, common trouble areas when converting to large-files, and helpful debugging
techniques. This section is important for writing applications to support large files.

6.1 Key Concepts 

The size of files for releases prior to HP-UX 10.20 has been limited to 2 GB. This is a direct result of using
32-bit data types for file sizes and offsets. In order to remove this boundary it is necessary to enlarge these
types to 64-bit data types. While this can be done for abstract data types, e.g. off_t, it cannot be done for
integral data types such as long, as these will not change until HP-UX is moved from a 32-bit OS to a
64-bit OS, which has occurred for HP-UX 11.0. The result of this is that two interfaces, fseek() and
ftell(), are not extensible to large files on a 32-bit OS because of their POSIX definition. In order to
address all of these issues and provide large-files on a 32-bit OS, several new compile options have been
provided to determine what compile environment to use and what interfaces to make available.
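
The 2 GB boundary is simply the largest value of a signed 32-bit integer, 2^31 - 1 = 2147483647. The following sketch makes the arithmetic explicit using fixed-width types; the helpers fits_in_32bit_offset() and narrow_offset() are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if offset can be held in a signed 32-bit quantity, such as a
 * long or an off_t on a 32-bit OS in the 32-bit compile environment. */
bool fits_in_32bit_offset(int64_t offset)
{
    return offset >= INT32_MIN && offset <= INT32_MAX;
}

/* Narrows a 64-bit offset to 32 bits, refusing (via assert) to
 * truncate silently. */
int32_t narrow_offset(int64_t offset)
{
    assert(fits_in_32bit_offset(offset));
    return (int32_t)offset;
}
```

An offset of exactly 2 GB (2147483648) already fails the check, which is why interfaces defined in terms of long cannot address large files on a 32-bit OS.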

In order to make an existing application large-files aware, it is imperative to understand the two new
compile environments, how large-files has been implemented, and the importance of using the correct header
files. It is also important for the developer to realize that enabling an application for large files will require
some work.


6.1.1 Affected data types, structures, and API’s 

The following table summarizes the data types and data structures that now have a 32-bit and 64-bit
version to support large-files. The structures below have elements that refer to the new data types. Therefore,
these structures will be able to support small files in the 32-bit compile environment and to support large
files in the 64-bit compile environment. The list of data structures is derived from the list of data types.
6.1.1.1 Data Types And Structures

data types

    blkcnt_t
    fsblkcnt_t (unsigned)
    fsfilcnt_t (unsigned)
    fpos_t (signed)
    off_t (signed)
    rlim_t (signed)

structures

    flock
        off_t l_start
        off_t l_len

    rlimit
        rlim_t rlim_cur
        rlim_t rlim_max

    stat
        off_t st_size
        blkcnt_t st_blocks

    statvfs
        fsblkcnt_t f_blocks
        fsblkcnt_t f_bfree
        fsblkcnt_t f_bavail
        fsfilcnt_t f_files
        fsfilcnt_t f_ffree
        fsfilcnt_t f_favail

    async_request
        off_t offset


6.1.1.2 POSIX API

The following table lists the POSIX API that now have a 32-bit and 64-bit version to support large-files.
These API are not broken by definition since they utilize abstract data types to represent file sizes or
offsets instead of integral data types.

Table 2: POSIX API

creat()        fgetpos()      fopen()        freopen()      fsetpos()
fstat()        fstatvfs()     fstatvfsdev()  ftruncate()    ftw()
lockf()        lseek()        lstat()        mmap()         nftw()
open()         prealloc()     stat()         statvfs()      statvfsdev()
tmpfile()      truncate()     getrlimit()    setrlimit()



POSIX API that cannot be extended for large-files

The fseek() and ftell() interfaces cannot be extended to support large files on a 32-bit OS because
the POSIX standard defines the use of long’s to represent an offset value.

long int ftell(FILE *stream);

int fseek(FILE *stream, long int offset, int whence);

The use of long’s by the standard limits these values to 32 bits on a 32-bit OS.

6.1.1.3 New Non-POSIX API 

The following new interfaces have been provided to allow the same functionality as fseek() and
ftell(), but use off_t’s instead of long’s. These interfaces are provided by both
_LARGEFILE_SOURCE and _LARGEFILE64_SOURCE.

#include <stdio.h>

off_t ftello(FILE *stream);

int fseeko(FILE *stream, off_t offset, int whence);


Refer to the manpages for specific information about these interfaces. 
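
As a hedged illustration of these two interfaces, the sketch below seeks with fseeko() and reads the position back with ftello(). It assumes the 64-bit compile environment so that off_t can hold offsets beyond 2 GB; seek_and_tell() is a hypothetical helper, and the macros are defined in the source only so the fragment stands alone.

```c
/* Sketch only: both macros must precede every #include. */
#ifndef _LARGEFILE_SOURCE
#define _LARGEFILE_SOURCE        /* makes fseeko()/ftello() available */
#endif
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64     /* makes off_t a 64-bit type */
#endif

#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: positions stream at offset from the start of
 * the file and reports where the stream ended up.
 * Returns the new offset, or -1 on failure. */
off_t seek_and_tell(FILE *stream, off_t offset)
{
    assert(stream != NULL);

    if (fseeko(stream, offset, SEEK_SET) != 0)
        return (off_t)-1;
    return ftello(stream);
}
```

The equivalent code written with fseek() and ftell() would have to pass the offset as a long, which is the limitation these interfaces remove.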

6.1.1.4 New 64-bit Specific Interfaces and Data types 

All of the data types and structures listed in “Section 6.1.1 Affected data types, structures, and API’s” have
64-bit counterparts. Similarly, all of the affected POSIX and Non-POSIX API have a new counterpart in the
form of interface64(). An example of this is open() and open64(). These interfaces are provided for
developers who do not want to use the 64-bit compile environment discussed in “Section 6.1.2 Compile
Environments”. It is not recommended to use these interfaces, as doing so will limit the portability of your
application in the future. Refer to the manpages for specifics about these new interfaces. These interfaces are
provided by the _LARGEFILE64_SOURCE compile option.

New 64-bit Specific Data Types

blkcnt64_t
fsblkcnt64_t
fsfilcnt64_t
fpos64_t
off64_t
rlim64_t

New 64-bit Specific Data Structures

flock64
stat64
statvfs64
async_request64


Table 3: New 64-bit Specific Interfaces

creat64()        fgetpos64()      fopen64()        freopen64()      fseeko64()
fsetpos64()      fstat64()        fstatvfs64()     fstatvfsdev64()  ftello64()
ftruncate64()    ftw64()          lockf64()        lseek64()        lstat64()
mmap64()         nftw64()         open64()         prealloc64()     stat64()
statvfs64()      statvfsdev64()   tmpfile64()      truncate64()     getrlimit64()
setrlimit64()






6.1.2 Compile Environments 

The implementation of large-files provides two compile environments for developers. These compile
environments signify the size of data types and data structures that are affected by large-files. It is important to
realize that large-files are supported in both environments; however, the interfaces and data types used in
each environment are different. Please note that the omission of _FILE_OFFSET_BITS is equivalent to
_FILE_OFFSET_BITS=32 for this release. The default of _FILE_OFFSET_BITS=32 may be changed in
future releases of HP-UX.


6.1.2.1 64-bit Compile Environment 

The 64-bit compile environment supplies 64-bit versions of the appropriate data types, data structures, and
interface calls for a 32-bit application. For instance, in this environment an off_t is a 64-bit data type and
an lseek is a 64-bit lseek. It is recommended that large-files applications be built in this environment. To
compile a source file in this environment, the following compile flag must be used:

-D_FILE_OFFSET_BITS=64

An example compilation line for foo.c would look as follows:

cc -Ae foo.c -D_FILE_OFFSET_BITS=64 -o foo

Beginning with HP-UX 11.0, if compiling a 64-bit application for running on a 64-bit OS, using the 64-bit
compile flag, +DDLP64, will set _FILE_OFFSET_BITS to 64 by default.

6.1.2.2 32-bit Compile Environment 

The 32-bit compile environment is the default environment, and is the same environment that has been
used in previous releases of HP-UX. Using the same example as above, an off_t is a 32-bit data type
and an lseek is the standard 32-bit lseek. Although this compile environment is the default, it can be
explicitly requested in the same way as the 64-bit compile environment. The following compile flag may be
used:

-D_FILE_OFFSET_BITS=32

The following examples show how to compile foo.c in the 32-bit compile environment. The first example
relies on the default behavior; the second explicitly requests the 32-bit compile environment.

cc -Aa foo.c -o foo

cc -Aa foo.c -D_FILE_OFFSET_BITS=32 -o foo

6.1.3 Calls and Environments

Note that a 64-bit call is not just a call from the 64-bit environment. Either environment can produce 64-bit 
calls. In the 32-bit environment, 64-bit calls are produced by the new API. In the 64-bit environment, the 
POSIX API produces 64-bit calls. The table summarizes this information: 


Table 4: Calls produced by an API in a Compile Environment

API                              32-bit compile environment   64-bit compile environment
POSIX API                        32-bit calls                 64-bit calls
New non-POSIX API                32-bit calls                 64-bit calls
New 64-bit specific interfaces   64-bit calls                 64-bit calls


Another way of looking at this information is to ask which combination of compile options makes which API
available, and whether that API uses 32-bit or 64-bit calls. This information is summarized in the table below.
Please note that omitting _FILE_OFFSET_BITS is equivalent to setting _FILE_OFFSET_BITS=32.


Table 5: API Made Available by Compile Flags

Compile Flags            API Available              Type of Call
-----------------------  -------------------------  ------------------------------
_LARGEFILE_SOURCE        New non-POSIX API and      32-bit calls
                         POSIX API
_LARGEFILE_SOURCE and    New non-POSIX API and      64-bit calls
_FILE_OFFSET_BITS=64     POSIX API
_FILE_OFFSET_BITS=64     POSIX API                  64-bit calls
_LARGEFILE64_SOURCE      New API (*64 calls) and    New API is 64-bit, other calls
                         New non-POSIX API          are 32-bit. The New non-POSIX
                                                    API is 32-bit.


6.1.4 Determining if large-files are supported 

For developers that want to write code that is portable between systems that support large files and
systems that do not support large files, two announcement macros are available. These announcement
macros are _LFS_LARGEFILE and _LFS64_LARGEFILE.

_LFS_LARGEFILE will be set to 1 if the system provides the interfaces specified by the
_LARGEFILE_SOURCE compile option. The following example shows how to query this macro.

#if _LFS_LARGEFILE == 1
/** _LARGEFILE_SOURCE is available **/
#else
/** _LARGEFILE_SOURCE not available **/
#endif

_LFS64_LARGEFILE will be set to 1 if the system provides the interfaces specified by the
_LARGEFILE64_SOURCE compile option. The following example shows how to query this macro.

#if _LFS64_LARGEFILE == 1
/** _LARGEFILE64_SOURCE is available **/
#else
/** _LARGEFILE64_SOURCE is not available **/
#endif

Please note that these macros only identify whether the interfaces are available when the source is compiled
with the appropriate compile option. The developer must still compile the source with
-D_LARGEFILE_SOURCE or -D_LARGEFILE64_SOURCE.
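
The two checks can be combined so that the rest of a program can ask once which interface sets were announced. This is a sketch only: the function lfs_support and the LFS_* constants are hypothetical names, and the announcement macros are assumed to be provided by <unistd.h>, as the Large File Summit specification describes.

```c
#include <unistd.h>  /* assumed home of _LFS_LARGEFILE and _LFS64_LARGEFILE */

/* Hypothetical bit flags describing what the system announces. */
enum { LFS_NONE = 0, LFS_SOURCE = 1, LFS64_SOURCE = 2 };

/* Returns a combination of LFS_SOURCE and LFS64_SOURCE, or LFS_NONE. */
int lfs_support(void)
{
    int level = LFS_NONE;

#if defined(_LFS_LARGEFILE) && _LFS_LARGEFILE == 1
    level |= LFS_SOURCE;        /* _LARGEFILE_SOURCE interfaces exist */
#endif
#if defined(_LFS64_LARGEFILE) && _LFS64_LARGEFILE == 1
    level |= LFS64_SOURCE;      /* _LARGEFILE64_SOURCE interfaces exist */
#endif
    return level;
}
```

The defined() guards keep the file compilable on systems whose headers do not define the macros at all, in which case lfs_support() reports LFS_NONE.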


6.1.5 Importance of Header Files 

One last point that cannot be overemphasized is the importance of header files to this implementation of
large files. All of the mapping of data types, data structures, and interfaces is done in header files. A
simplified example of how this mapping is done follows:

#if _FILE_OFFSET_BITS == 64
static off_t lseek(a,b,c) off_t b; { return _lseek64(a,b,c); }
#endif

Where you may have gotten away in the past with not including the header file required to use an interface,
failure to do so with large files will cause incorrect behavior of your application. Unfortunately, the
consequences of not including the appropriate header files will not be seen until run time, and in some
cases can be destructive. An example of this will be shown in “Section 6.3 Conversion Issues/Examples”.

If the 64-bit compile environment is being used with calls to any of 
the interfaces listed in Table 2: POSIX API, the appropriate header 
file must be included in the source file. To determine which header 
files are necessary, consult the man page for each interface listed in 
Table 2: POSIX API that is being used in the source file. 
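
The mapping idea can be tried in miniature without touching the system headers. In the sketch below every name (DEMO_OFFSET_BITS, demo_off_t, demo_seek, demo_seek_wide, demo_seek_narrow) is hypothetical; the code only mimics how a header can select both the offset type and the routine behind one public name, and is not the actual HP-UX header code.

```c
#include <stdint.h>

#ifndef DEMO_OFFSET_BITS
#define DEMO_OFFSET_BITS 64   /* stands in for _FILE_OFFSET_BITS */
#endif

#if DEMO_OFFSET_BITS == 64
typedef int64_t demo_off_t;
/* Stand-in for the 64-bit entry point (_lseek64 in the text). */
static demo_off_t demo_seek_wide(demo_off_t pos) { return pos; }
/* The public name forwards to the wide routine. */
static demo_off_t demo_seek(demo_off_t pos) { return demo_seek_wide(pos); }
#else
typedef int32_t demo_off_t;
/* Stand-in for the traditional 32-bit entry point. */
static demo_off_t demo_seek_narrow(demo_off_t pos) { return pos; }
static demo_off_t demo_seek(demo_off_t pos) { return demo_seek_narrow(pos); }
#endif
```

A source file that never sees this header would keep calling the old routine with the old type, which is the failure mode the paragraph above warns about.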

6.2 Conversion Choices 

There are basically 5 options for a developer considering large file support: 

1. Do nothing. This is a reasonable option if the program does not need to read or write large files,
and if an occasional open or stat failure will not cause confusing behavior.

2. Make the program aware of large files, without actually supporting them. This would involve
looking for the new open and stat errors, and behaving in some sensible way when they are
encountered. For example, if your application is a browser which currently exits when stat fails, then it
might be changed to display something reasonable and continue instead.

3. Use the new API in the 32-bit compile environment.

4. Use the POSIX API and the New non-POSIX API in the 64-bit compile environment. If you do not
need a 64-bit version of ftell() and fseek(), then you do not need the New non-POSIX API.

5. If the application is only for use on a 64-bit OS, compile the application as a 64-bit application.

Options 3, 4 and 5 allow the application to fully support large files. If the application must support large
files, one of these three options must be chosen.

6.2.1 Using the New API or POSIX API?

How do you decide between the new API and the POSIX API? 

The advantage of the new API is that the type cleanup required can be confined to the routines that handle
the files that need to be large. The disadvantage is that source code gets messier, and less portable.

The advantage of the POSIX API is portability. The disadvantage is that you can run into problems mixing
objects from the 32-bit and the 64-bit compile environments. There are no problems with linking to libc,
because of the header file code that we described above. The linker will not prevent mixing of objects in the
same executable, since the object is not in any way marked as being from one environment or the other.

But suppose one of the source files contains a routine that passes an off_t, or a struct stat, or
anything else whose size is affected by the compile environment, to a routine in another file compiled in the
other environment. This code won’t work, and it can be very difficult to figure out why not, since the source
code looks fine.

The problem can also exist between your objects and a library that you use, possibly supplied by some
other company. If the library interface contains an off_t, or any other data type whose size is changing,
then using the POSIX API with the 64-bit environment probably won’t work. The library was almost
certainly compiled in the 32-bit environment, which means that it expects the smaller data types. It might be
just as capable as libc, with the appropriate code in header files to make either environment work, but this
is unlikely, and it should not be assumed unless the library supplier says that it works.

Because of these mixing issues, we advise using the POSIX API only when every object file can be
compiled in the 64-bit environment. You can mix objects if you’re sure that no data is passed around (or
referenced as a global) whose type is different in the two environments, but we recommend against it.
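
To see why such mixing fails silently, consider how two objects would lay out the "same" structure. The sketch below is illustrative only: struct record32 and struct record64 are hypothetical stand-ins for one structure declaration as seen from the 32-bit and the 64-bit compile environments.

```c
#include <stddef.h>  /* offsetof, size_t */
#include <stdint.h>

/* Layout as seen by an object built with a 32-bit offset type. */
struct record32 {
    int32_t offset;
    int32_t flags;
};

/* The "same" structure as seen by an object built with a 64-bit offset type. */
struct record64 {
    int64_t offset;
    int32_t flags;
};

/* Byte position at which each side expects to find the flags member. */
size_t flags_position32(void) { return offsetof(struct record32, flags); }
size_t flags_position64(void) { return offsetof(struct record64, flags); }
```

The two sides disagree about where flags lives and about the total size of the structure, yet both source files compile cleanly and the linker has no way to notice.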

6.3 Conversion Issues/Examples 

6.3.1 General Conversion Issues 

In many programs, assumptions are made that ints, longs, pointers, and offsets are all the same size. This
leads to programming practices that are not conducive to converting to large files, and will not be
conducive to LP64 data models on 64-bit operating systems. Every program that makes the above assumptions
will need to be modified to support large files. Typical problems include:

• offsets are assigned to an int or a long. 

• offsets are passed to routines whose parameters are undeclared, or declared as the wrong type. 

• constants are passed as offsets without casting in non-ANSI mode. For example:

  Existing code fragment:

      lseek(fd, 0, SEEK_SET);

  In the 64-bit compile environment, the call would need to be as follows:

      lseek(fd, (off_t)0, SEEK_SET);

  With the new 64-bit API:

      lseek64(fd, (off64_t)0, SEEK_SET);

• precision is lost in shifting. An example:

  The existing code fragment assumes that off_t is the same size as int:

      off_t offset;
      int blkno;

      offset = blkno << BLOCK_SHIFT;

  In the 64-bit compile environment, the code should be as follows:

      off_t offset;
      int blkno;

      offset = (off_t)blkno << BLOCK_SHIFT;

  NOTE: The typecasting of blkno forces the promotion of blkno to a 64-bit value before the shift
  takes place. Without the typecasting, the shift would have occurred on blkno as an int
  (32 bits), and only then would the value have been promoted to 64 bits.


• types are embedded in generic libc calls. For example:

  Existing printf call:

      printf("%d\n",offset);

  The above code fragment works for a 32-bit offset, but it does not work for
  a 64-bit offset. In the 64-bit compile environment, offset would require
  %lld. For portability, use a #define for the format in these calls, selected
  by the value of _FILE_OFFSET_BITS. The above code fragment also assumes that
  an int and a long are the same size; this may not be a safe assumption in
  the future.

      /** Define in header file or top of source **/
      #if _FILE_OFFSET_BITS == 64
      #define offset_format "%lld"
      #else
      #define offset_format "%ld"
      #endif

      /** Somewhere in program **/
      printf(offset_format "\n", offset);

  There is also a macro available in <portal.h> that will do this type of
  mapping. The following code example shows how to use this macro.

      #include <stdio.h>
      #include <portal.h>
      main()
      {
          off_t a;
          a = 50;

          printf("a is %" PRIdF64 "\n",a);
      }

  A similar problem to the above example is the use of strtol() and strtoll().
  Calls to these interfaces should be replaced with calls to strtoimax() and
  strtoumax(). Calls to these interfaces will determine the correct translation
  for the current environment.
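
Several of the problems listed above can be illustrated together. The following is a sketch under stated assumptions: it uses fixed-width types so that the arithmetic is the same on any host, the value of BLOCK_SHIFT and all four function names are hypothetical, and the "%jd" conversion and strtoimax() assume a C99 library.

```c
#include <inttypes.h>   /* intmax_t, strtoimax */
#include <stdint.h>
#include <stdio.h>

#define BLOCK_SHIFT 13  /* hypothetical: 8 KB blocks */

/* Widen first, then shift: the result keeps all 64 bits. */
int64_t block_to_offset(int32_t blkno)
{
    return (int64_t)blkno << BLOCK_SHIFT;
}

/* What a 32-bit shift keeps: only the low 32 bits of the result. */
uint32_t block_to_offset_low32(int32_t blkno)
{
    return (uint32_t)blkno << BLOCK_SHIFT;
}

/* Format an offset without knowing its width by converting to intmax_t. */
int format_offset(char *buf, size_t len, int64_t offset)
{
    return snprintf(buf, len, "%jd", (intmax_t)offset);
}

/* Parse an offset with strtoimax(), which works at the widest integer width. */
int64_t parse_offset(const char *text)
{
    return (int64_t)strtoimax(text, NULL, 10);
}
```

For block number 1048576 the widened shift yields 8589934592 (8 GB), while the 32-bit shift keeps only the low 32 bits of that value, which are all zero.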

6.3.2 fseek() and ftell() Conversion Issues

For programs that use fseek() and/or ftell(), a decision must be made as to how to migrate this code to large
files. This source code, even if strict typing had been used, cannot be converted just by using the
_FILE_OFFSET_BITS=64 compile option. One option is to change the ftell() and fseek() calls to
ftello() and fseeko() respectively. The developer would also have to ensure that the appropriate
variables are changed from long int to off_t. The following two trivial code fragments show what
changes would be necessary.

long int offset; 
int rslt; 

offset = ftell(fd); 
if (offset != -1) 

rslt = fseek(fd,offset,SEEK_CUR); 


The above code would need to be changed as follows:

off_t offset; 
int rslt; 

offset = ftello(fd); 
if(offset != -1) 

rslt = fseeko(fd,offset,SEEK_CUR); 
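
As a fuller illustration of the ftello()/fseeko() style, the sketch below positions a stream at its midpoint while keeping every offset in an off_t. The helper name seek_to_half is hypothetical, and defining _LARGEFILE_SOURCE in the source file rather than on the compile line is an assumption about how the build is set up.

```c
#ifndef _LARGEFILE_SOURCE
#define _LARGEFILE_SOURCE 1   /* request ftello()/fseeko(); must precede the includes */
#endif

#include <stdio.h>
#include <sys/types.h>  /* off_t */

/* Hypothetical helper: move to the middle of the stream.
 * Returns the new position, or -1 on error. */
off_t seek_to_half(FILE *fp)
{
    off_t end;

    if (fseeko(fp, (off_t)0, SEEK_END) != 0)
        return (off_t)-1;
    end = ftello(fp);
    if (end == (off_t)-1)
        return (off_t)-1;
    if (fseeko(fp, end / 2, SEEK_SET) != 0)
        return (off_t)-1;
    return ftello(fp);
}
```

Because no value ever passes through a long or an int, the same source is valid whether off_t is 32 or 64 bits wide.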


Another option is to use the available macros to determine what types and interfaces are going to be
used. The following code demonstrates a way to address this situation and maintain portability.

movepos(fd)
FILE *fd;
{
    int rslt;

#if defined(_LFS_LARGEFILE) == 1 && defined(_LARGEFILE_SOURCE)
#define ftell ftello
#define fseek fseeko
#if _FILE_OFFSET_BITS == 64
#define offset_format "%lld"
#else /** _FILE_OFFSET_BITS == 32 **/
#define offset_format "%ld"
#endif
    off_t offset;
    off_t num;
#else
#define offset_format "%ld"
    long int offset;
    long int num;
#endif

    offset = ftell(fd);
    if(offset != -1)
    {
        num = offset/2;
        rslt = fseek(fd,num,SEEK_CUR);
        if(!rslt)
            printf("New Position " offset_format "\n",offset-num);
        else
            printf("Error!\n");
    }
}

The code fragment above will work on a system that does not support large files and on a system that does
support large files. It will also work in both the 32-bit compile environment and the 64-bit compile
environment.

Note that ftell() and fseek() have been mapped to ftello() and fseeko() if the system supports
large files and the new non-POSIX API has been enabled. Also note that the printf() trick described in
the “General Conversion Issues” section is being used in this code.

Note: This is not an issue for 64-bit applications for a 64-bit OS.


6.3.3 Example 1 - Simple conversion 

This simple example demonstrates the types of changes that are necessary to existing code. This example 
is not applicable to 64-bit applications for a 64-bit OS. 

6.3.3.1 Original Program 

#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>

print_byte(name, offset)
char *name;
int offset;
{
    int fd;
    int ret;

    if((fd = open(name,O_RDONLY)) == -1) {
        printf("Open Failed\n");
        return(-2);
    }

    if((ret=lseek(fd, offset, SEEK_SET)) == -1) {
        printf("lseek failed\n");
        return(-3);
    }

    /** Print ret **/
    printf("lseek returned %d\n",ret);
}

save(name)
char *name;
{
    struct stat statb;
    int x;

    if(stat(name,&statb) == -1) {
        printf("Stat Failed\n");
        return(-1);
    }

    x = statb.st_size/2;
    print_byte(name,x);
}

main()
{
    char *name="Fred";
    save(name);
}


This program will work fine as long as the file is small. However, if the file is large, the stat() call will fail
and the program will return a -1.

6.3.3.2 Converted Program using new API 

#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>

print_byte(name, offset)
char *name;
off64_t offset;
{
    int fd;
    off64_t ret;

    if((fd = open64(name,O_RDONLY)) == -1) {
        printf("Open Failed\n");
        return(-2);
    }

    if((ret=lseek64(fd, offset, SEEK_SET)) == -1) {
        printf("lseek failed\n");
        return(-3);
    }

    /** Print ret **/
    printf("lseek returned %lld\n",ret);
}

save(name)
char *name;
{
    struct stat64 statb;
    off64_t x;

    if(stat64(name,&statb) == -1) {
        printf("Stat Failed\n");
        return(-1);
    }

    x = statb.st_size/2;
    print_byte(name,x);
}

main()
{
    char *name="Fred";
    save(name);
}


In this example the stat(), open() and lseek() calls have been changed to the corresponding 64-bit
calls. The statb structure has been changed to a struct stat64, which is like a struct stat, except
that some fields are larger. One field that’s larger is st_size, which is now an off64_t. Since st_size
is larger, it must be assigned to a variable of type off64_t; hence the change in the declarations of x and
offset. This program will now work on large files as long as it is compiled with
-D_LARGEFILE64_SOURCE.

6.3.3.3 Converted Code using POSIX API (_FILE_OFFSET_BITS=64) 

Now let’s look at the converted version using the POSIX API in the 64-bit compile environment. You might
think that there is nothing to change, since the existing code already uses the POSIX API, but this isn’t
true. Even though the program works when compiled in the 32-bit environment, it doesn’t work when
compiled in the 64-bit environment. That’s because the program implicitly assumes that st_size/2 can be
assigned to an int (x), and that x can be passed to a routine that takes an int as parameter. These
assumptions used to be correct, though unwise. Now they’re just wrong.


#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>

print_byte(name,offset)
char *name;
off_t offset;
{
    int fd;
    off_t ret;

    if((fd = open(name,O_RDONLY)) == -1) {
        printf("Open Failed\n");
        return(-2);
    }

    if((ret=lseek(fd, offset, SEEK_SET)) == -1) {
        printf("lseek failed\n");
        return(-3);
    }

    /** Print ret **/
    printf("lseek returned %lld\n",ret);
}

save(name)
char *name;
{
    struct stat statb;
    off_t x;

    if(stat(name,&statb) == -1) {
        printf("Stat Failed\n");
        return(-1);
    }

    x = statb.st_size/2;
    print_byte(name,x);
}

main()
{
    char *name="Fred";
    save(name);
}


This version should look a lot more familiar. In fact, this version could be compiled and run in either the
32-bit or the 64-bit compile environment. The source has no assumptions about whether off_t is larger or
smaller than an int or anything else.
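
With ANSI prototypes in scope, the same width-neutral style can be written without any format string that depends on the compile environment. The following is a sketch, not a replacement for the listing above: seek_to_middle and print_offset are hypothetical names, and the "%jd" conversion assumes a C99 library.

```c
#include <stdint.h>     /* intmax_t */
#include <stdio.h>
#include <sys/types.h>  /* off_t */
#include <sys/stat.h>
#include <unistd.h>

/* Seek to the middle of an open file; returns the new offset or -1. */
off_t seek_to_middle(int fd)
{
    struct stat statb;

    if (fstat(fd, &statb) == -1)
        return (off_t)-1;
    /* st_size is an off_t, so the division is done at full width. */
    return lseek(fd, statb.st_size / 2, SEEK_SET);
}

/* Print an offset without assuming how wide off_t is. */
void print_offset(off_t offset)
{
    printf("lseek returned %jd\n", (intmax_t)offset);
}
```

Converting to intmax_t at the point of printing removes the need for a per-environment format macro, at the cost of requiring a library that understands "%jd".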

6.3.4 Example 2 - Dealing with Large Files 

The following example is a trivial program that will take an existing file, extend it out to 200 bytes, and write
“jklmno” at the end of the file. If the file does not exist it creates the file.

6.3.4.1 Original Code

#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

main(argc,argv)
int argc;
char **argv;
{
    int fd;
    off_t rslt;
    ssize_t wrote;
    struct stat statb;

    if(argc < 2) {
        printf("Usage: %s filename\n",argv[0]);
        exit(1);
    }

    /** Check to see if file exists **/
    if(stat(argv[1],&statb) == -1) {
        printf("Stat Failed\n");
        printf("Creating File\n");

        /** Create file **/
        if((fd = creat(argv[1],(mode_t)0777)) == -1) {
            perror("creat fails");
            exit(1);
        }
    }
    else
        if((fd = open(argv[1],O_RDWR)) == -1) {
            perror("open fails");
            exit(1);
        }

    if((rslt=lseek(fd, 200,SEEK_SET)) == -1) {
        perror("lseek fails");
        exit(1);
    }

    wrote = write(fd,"jklmno", 5);
    close(fd);
}

This code works correctly in the 32-bit compile environment and may even appear to be clean for the 64-bit 
compile environment. However, this example will overwrite the beginning of the file in the 64-bit compile 
environment if it is compiled in K&R mode. This problem will not arise if the source is compiled in ANSI 
mode. The existing file “Fred” looked as follows before either program was run. 

File Before Either Program is Run

This is the beginning of the file. These characters should not
be modified by the second example.


Compile line for 32-bit and results of program execution

cc example2.c -o example2

File after execution

This is the beginning of the file. These characters should not
be modified by the second example.

jklmn

The result of the 32-bit executable is what we would expect. The characters “jklmn” have been added to the
end of the file.

Compile line for 64-bit and results of program execution

cc example2.c -D_FILE_OFFSET_BITS=64 -o example2lf

File after execution

jklmnis the beginning of the file. These characters should not
be modified by the second example.


The result of the 64-bit version may be unexpected as it overwrote the data at the front of the file instead of 
seeking to the correct position and then writing out the data. What is also distressing is that error checking 
did not catch this problem. It is very important that constants be typecast as the correct type when being 
sent as parameters to another function if the source is not being compiled in ANSI mode. The following 
example will show how to fix this problem. 

6.3.4.2 Corrected Source 


#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

main(argc,argv)
int argc;
char **argv;
{
    int fd;
    off_t rslt;
    ssize_t wrote;
    struct stat statb;

    if(argc < 2) {
        printf("Usage: %s filename\n",argv[0]);
        exit(1);
    }

    /** Check to see if file exists **/
    if(stat(argv[1],&statb) == -1) {
        printf("Stat Failed\n");
        printf("Creating File\n");

        /** Create file **/
        if((fd = creat(argv[1],(mode_t)0777)) == -1) {
            perror("creat fails");
            exit(1);
        }
    }
    else
        if((fd = open(argv[1],O_RDWR)) == -1) {
            perror("open fails");
            exit(1);
        }

    if((rslt=lseek(fd,(off_t)200,SEEK_SET)) == -1) {
        perror("lseek fails");
        exit(1);
    }

    wrote = write(fd,"jklmno", 5);
    close(fd);
}


This code now works correctly for both the 32-bit compile environment and the 64-bit compile environment.
Notice that the change was simply typecasting the second parameter to lseek() as an off_t. The only line
of code that was modified is the call to lseek().

6.3.5 Mixing 32-bit and 64-bit Calls

We have already talked about mixing objects from the two compile environments, and advised against it. 
But two other types of mixing are also possible. 

First, the old and new API may be mixed in the same program, and even in the same source file. There is
still the chance of an error here (you could pass an off_t to a routine expecting an off64_t), but at least
these problems can be diagnosed by looking at the source.

Second, 32-bit calls and 64-bit calls can be mixed on the same file descriptor. Whether this is a good idea 
depends on how you do it, and so we look at this issue in some detail here. 

You might think that mixing calls on the same file descriptor would have to be done by using both the old
and the new API, since we don’t recommend linking objects from the two environments. But note that file
descriptors can be inherited across fork and exec, and can also be passed to other processes via
special syscalls. That is, a file descriptor opened with a 32-bit call might be passed to or inherited by a program
that uses 64-bit calls, or vice versa.

One feature of this cross-process mixing is that the process opening the file doesn’t always know much 
about the process on the receiving end. If your application is a shell, and you are asked to redirect output 
for program X, you usually have no idea if program X is converted to support large files or not. Should you 
use a 32-bit open or a 64-bit open? The answer isn’t obvious.

The problems occur when the 64-bit part of the program opens a file and passes the descriptor to the
32-bit part of the program (where the parts may be in the same process, or in different ones). Any other type
of mixing should be safe. This means, for example, that a developer should be able to safely use 64-bit
calls inside libraries, as long as the library interface doesn’t change, and as long as the library does not do
a 64-bit open and pass the resulting file descriptor back to the calling code. The modified library would
work correctly linked to code containing 32-bit calls, or code containing 64-bit calls.

In the more dangerous situation, the 32-bit part of the program (again, in the same process, or in a
different process) gets a file descriptor which has been opened with the 64-bit open. The file may be large, in
which case the program may not lseek correctly, possibly because of overflow when calculating the seek
offset. If the file is small, the 32-bit part can grow the file to be large.

6.3.5.1 Avoiding Mixing Problems 

There is a general strategy for avoiding these mixing problems. Any code that opens a file with the 64-bit
open should ensure that the file descriptor is not passed to code that cannot support large files. For
example, a library routine that opens a file for use by the caller (like fopen) should rely on some indication from
the caller that a 64-bit open is required; otherwise, it should do a 32-bit open. And shells should use a
32-bit open rather than a 64-bit open for maximum safety.

Unfortunately, maximum safety has a price. If a shell uses a 32-bit open, then it will fail on cases like this: 

$ app <large_file 

The solution might be to use a 64-bit open after all, which is what our standard shells will do. The
argument here is that the application is unlikely to lseek on its standard input or output, since these could also
be pipes. The argument is not airtight, since the application could check the type of its input and output
files, seeking only if they aren’t pipes. But it does seem to be true that programs that don’t lseek are safe,
even if they use 32-bit calls on a file descriptor opened with a 64-bit open.
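
The kind of check mentioned above, where an application examines the type of an inherited descriptor before seeking on it, might look like the following. This is a sketch with a hypothetical function name; note that in the 32-bit compile environment fstat() can itself fail on a large file, and this helper reports such a descriptor as not safe to treat as a regular file.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if fd refers to a regular file, 0 if it
 * refers to anything else (pipe, terminal, ...) or if fstat() fails. */
int is_regular_file(int fd)
{
    struct stat sb;

    if (fstat(fd, &sb) == -1)
        return 0;
    return S_ISREG(sb.st_mode) ? 1 : 0;
}
```

A program that seeks only when is_regular_file(0) returns 1 behaves the same with piped and redirected input, and declines to seek when the descriptor cannot be examined.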

6.3.6 Converting Libraries 

If your application is a library, then you shouldn’t even think about converting the library to support large
files in a way that changes the sizes of parameters for existing library entry points. The problem is that
people will link your new library with an old application, and the results will be disastrous. Instead, you must
add new entry points, as we did for libc. This way, existing applications maintain their current behavior.
Users of the new entry points must use the new names for them, unless your language allows you to play
some naming trick similar to the one we are using for libc.
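
The "add new entry points" advice can be sketched as follows. Everything here is hypothetical (struct lib_handle, lib_tell, lib_tell64); the old entry point is given a fixed 32-bit return type so that the example behaves the same on any host.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical library handle that tracks a 64-bit position internally. */
struct lib_handle {
    int64_t position;
};

/* Existing entry point: signature unchanged, so old callers keep working. */
int32_t lib_tell(const struct lib_handle *h)
{
    if (h->position > INT32_MAX) {
        errno = EOVERFLOW;      /* position not representable for old callers */
        return -1;
    }
    return (int32_t)h->position;
}

/* New entry point, added alongside the old one for large-file-aware callers. */
int64_t lib_tell64(const struct lib_handle *h)
{
    return h->position;
}
```

Old applications linked against the new library see either a correct small value or a clean error, never a silently truncated offset.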

6.4 Common Pitfalls and Debugging Techniques 

This section discusses a few key areas that can cause a developer problems, how to recognize the
problem, and debugging techniques to resolve these problems.

6.4.1 Not Including Appropriate Header Files

It is common for programs to be developed that do not include the appropriate header file for a given
interface and yet execute correctly. Programs such as these will not work in the 64-bit compile
environment since the implementation of the 64-bit compile environment is done in header files. The
following example demonstrates this point.

To determine which header files need to be included, consult the
man page for all interfaces listed in Table 2: POSIX API.

File: sparse_broken.c

#include <fcntl.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#define ONEGB 1073741824L

main(argc,argv)
int argc;
char **argv;
{
    int fd;
    off_t offset,rslt;
    ssize_t wrote;
    long val;

    if(argc < 3) {
        printf("Usage: %s num_gbs filename\n",argv[0]);
        exit(1);
    }

    /** Number of gigabytes requested on the command line **/
    val = atol(argv[1]);

    offset = (off_t)ONEGB;
    offset = offset * val;
    printf("offset is %lld\n",offset);

    /** Create file **/
    if((fd = creat(argv[2],(mode_t)0777)) == -1) {
        perror("creat fails");
        exit(1);
    }

    if((rslt=lseek(fd,offset,SEEK_SET)) == -1) {
        perror("lseek fails");
        exit(1);
    }

    wrote = write(fd,"jklmno", 5);
    close(fd);
}


This program will compile without any errors in both environments. However, if this source is compiled with
_FILE_OFFSET_BITS=64, the resulting application will not behave correctly. In some cases the
application will give a run time error of “Invalid argument”, and in other cases the application will just do the wrong
thing. Examples of this behavior are shown below:

Compilation line of above source: 

cc sparse_broken.c -D_FILE_OFFSET_BITS=64 -o sparse_broken 


Test Runs and resulting Files: 

sparse_broken 1 lfile1
offset is 1073741824
lseek fails: Invalid argument
ll lfile1
0 Feb 5 17:59 lfile1

sparse_broken 2 lfile2
offset is 2147483648
lseek fails: Invalid argument
ll lfile2
0 Feb 5 17:59 lfile2

sparse_broken 4 lfile4
offset is 4294967296
ll lfile4
2147377157 Feb 5 17:59 lfile4

The size given for lfile4 is incorrect; it should be 4294967301.

The problem with this source file is that it does not include <unistd.h>, which is required in order to use the
lseek() interface. Since lseek() is mapped to the 64-bit equivalent in <unistd.h>, failure to include
it means that the 32-bit lseek() is being called. These types of errors can be very troublesome to debug
since different run time results occur depending on the input values.

Method to Debug this case 

When an application is exhibiting confusing behavior such as in this example, the best way to check for
header file problems is to use the cc preprocessor. By preprocessing the source file, all of the data and
interface mapping that is done for large files will be resolved and you can check the resulting output to
ensure that everything is getting mapped as you would expect. The cc preprocessor output can be very
verbose, so it is best to send the output to a file and then examine its contents. The following
example demonstrates this technique on sparse_broken.c.

% cc -E sparse_broken.c -D_FILE_OFFSET_BITS=64 > sparse_broken.cpped
% grep lseek sparse_broken.cpped
if((rslt=lseek(fd,offset,0)) == -1) {
perror("lseek fails");

As you can see, there is no mapping of lseek() to _lseek64(). The following example shows what
the cc preprocessor output looks like when the <unistd.h> header file is added to this source (sparse.c).

% cc -E sparse.c -D_FILE_OFFSET_BITS=64 > sparse.cpped
% grep lseek sparse.cpped
extern off_t lseek();
extern off_t _lseek64();
static off_t lseek(a,b,c) off_t b; { return _lseek64(a,b,c); }
if((rslt=lseek(fd,offset,0)) == -1) {
perror("lseek fails");

The important thing to notice in the above code fragment is that lseek() is mapped to _lseek64()
through an in-line function in the header file. This methodology is used by the 64-bit compile
environment to map all of the large-files interfaces to their 64-bit equivalents.

6.4.2 Not compiling with the correct options

The following example demonstrates a run time error that will occur if you do not compile with the correct
compile options. This example was also chosen because it demonstrates the error message associated
with the new EOVERFLOW error. This is a very simple program that attempts to open the file named
in its command line argument. If the file cannot be opened, perror() is called.

Source: examples.c 

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>

main(argc,argv)
int argc;
char *argv[];
{
    int fd;

    if(argc < 2) {
        printf("Usage: %s filename\n",argv[0]);
        exit(-1);
    }

    if((fd = open(argv[1],O_RDWR)) == -1) {
        perror("Open Failed");
        exit(-1);
    }

    if((fd = close(fd)) == -1) {
        perror("Close Failed");
        exit(-1);
    }
}

Compilation Line:

cc examples.c -o examples

Invocation of example (note: largefile is a 2gb+ file) and results: 

% examples largefile 

Open Failed: Value too large to be stored in data type 


This is an important error message to be aware of, as it is a general purpose error message. Developers of
large-files applications should have an alarm go off if they receive this error. While this error is used as an
example of not compiling the source with the -D_FILE_OFFSET_BITS=64 compile option, it will also
occur if the appropriate header files are not included. In this case, failure to include <fcntl.h> and using
the value of O_RDWR would result in this error.
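
A program that is only meant to be aware of large files, rather than to support them, can at least tell this error apart from ordinary failures. The following is a minimal sketch; the function try_open and the enum open_status are hypothetical names.

```c
#include <errno.h>
#include <fcntl.h>

enum open_status { OPEN_OK, OPEN_TOO_LARGE, OPEN_FAILED };

/* Open path read-only and classify the outcome.
 * On success *fd_out holds the descriptor; otherwise it is -1. */
enum open_status try_open(const char *path, int *fd_out)
{
    *fd_out = open(path, O_RDONLY);
    if (*fd_out != -1)
        return OPEN_OK;
    /* EOVERFLOW: the file exists but is too large for this environment. */
    return (errno == EOVERFLOW) ? OPEN_TOO_LARGE : OPEN_FAILED;
}
```

On OPEN_TOO_LARGE the caller can print a message that points at the compile environment instead of a generic "open failed".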

The correct compilation line is: 

cc examples.c -D_FILE_OFFSET_BITS=64 -o examples 

Method to Debug this case 

Once again the cc preprocessor is the best means to begin debugging this situation. If perror() prints
out “Value too large to be stored in data type”, the first step is to ensure that the application is not
executing the 32-bit interfaces when it should be executing the 64-bit interfaces (this example assumes that you
want to correctly handle large files). The same debug steps that applied to the example in “Section 6.4.1 Not
Including Appropriate Header Files” apply here.

7. Appendix 1 - HP Specific Implementation Details

This note describes HP-specific aspects of HP’s large files implementation. It is intended for readers
familiar with the specification entitled Adding Support for Arbitrary File Sizes to the Single UNIX Specification,
from the group calling itself the Large File Summit. That document, and other information about the
Summit, is available via http://www.sas.com/standards/large.files/. The comments in this note, and the section
number references, are based on the version of March 20, 1996. That is the version submitted by the
Summit to X/Open, for possible inclusion in the next version of the XPG.

We describe places where HP differs from the Summit spec, and also supply details of HP’s
implementation choices where these are left open by the spec. In general, HP supports the entire Summit spec.

Both this note and the Summit spec describe the implementation of large files for the C language only.
Other languages are beyond the scope of this document.

data type sizes

HP offers two compile environments for the C language. The first, available without any special flags, has
the usual 32-bit file-related data types. In the second environment, the following types are 64 bits:

off_t
fpos_t
rlim_t
blkcnt_t
fsblkcnt_t
fsfilcnt_t

(Note that ino_t is 32 bits in both environments.) The second environment is selected by setting
_FILE_OFFSET_BITS to 64 (A.3.2.4).

HP also offers the Transitional Extensions (3). All new data types described there are 64 bits on HP
machines, with the exception of dirent64 and ino64_t, which are not provided (3.1.2.3).

mixing object files 

HP allows linking together of object files from the two compile environments (3.3.4). We ensure that
references to libc are resolved properly; for example, “lseek” will automatically refer to the 32-bit lseek
in the 32-bit compile environment, and to the 64-bit lseek in the 64-bit compile environment. Developers
must ensure that references that cross object file boundaries do not break if these files are compiled in
different environments.

offset maximum 


Since off_t is 32 bits in the default compile environment, the offset maximum there (1.4, 2.1) is 2 GB -1. 
For transitional extensions (3), and in the 64-bit compile environment, the offset maximum is 2**63 - 1 
bytes. 

file system types

Both HFS (UFS) and JFS support large files. HP’s initial large files release, HP-UX 10.20, will not include 
the V3 protocol for NFS; HP-UX 11.0 does include large file support over NFS using the V3 protocol. 

getrlimit and setrlimit

HP-UX maintains resource limits as though they were 32-bit quantities internally. Therefore, resource limits
are always representable in an object of type rlim_t (2.2.1.19), whether rlim_t is 32 bits or 64.

RLIM_SAVED_MAX and RLIM_SAVED_CUR are never returned from getrlimit or getrlimit64. For
setrlimit and setrlimit64, any limit too large for a 32-bit rlim_t is interpreted as RLIM_INFINITY.

mmap 

Currently, attempts to map past the first 2 GB -1 bytes with mmap in the 64-bit compile environment, or with
mmap64, fail with EOVERFLOW. That is, HP does not support mmap on large files. This is expected to
change in a future release.

asynchronous I/O 

The initial large files release, HP-UX 10.20, does not include support for POSIX Asynchronous I/O at all,
on small files or large ones. POSIX Async support is available in HP-UX 11.0, and supports large files.

HP has made available a proprietary async I/O interface for certain ISVs. This interface now supports large
files.

lseek


HP returns EINVAL for the error case that the Summit designates as EOVERFLOW (2.2.1.22), both for
historical reasons, and because XPG4 seems to require EINVAL.

getconf

HP does not support the new getconf macros described in 3.3.4. 

mount options 

HP supports the mount options mentioned in 3.2.1. Every file system is either a nolargefiles system
(currently the default), or a largefiles file system. This state is set at mkfs time, and can be changed
with fsadm, though fsadm cannot convert from largefiles to nolargefiles if any large files are
present. A mount with neither option accepts either type of file system. A mount invocation that specifies
-o largefiles or -o nolargefiles fails if the file system does not have the type requested. Only
HFS (UFS) and JFS support these options, since only these file systems support large files.

A nolargefiles file system has no large files on it. Also, none can be created; the system behaves as
though the offset maximum for every open is 2 GB -1, regardless of the size of off_t or off64_t.

shell redirection 

Shell redirection is done with large opens (2.3.2).

8. Appendix 2 - Terms/Definitions

Large file 

A file which is 2 GB or greater in size.

Small file 

A file which is less than 2 GB. 


API 

Application Programming Interface. 

ABI 

Application Binary Interface. 

Old API 

The POSIX API containing routines such as creat, open, etc. 

New API 

A 64-bit API containing routines such as creat64, open64, etc. 

32-bit compile environment 

Compile environment with 32-bit data types (e.g., 32-bit off_t, rlim_t). 

64-bit compile environment 

Compile environment for a 32-bit application with 64-bit data types (e.g., 64-bit off_t, rlim_t). This
compile environment is made available by using the _FILE_OFFSET_BITS compile option. This compile
option is used as follows:

cc -D _FILE_OFFSET_BITS=64 

In this document the “64-bit compile environment” is one where file offset and file size related data 
types become 64-bit. 

Old ABI 

ABI that is created by 32-bit compile environment. 

New ABI 

ABI that is created by 64-bit compile environment or 64-bit new API. 

32-bit calls 

Calls from the old API in the 32-bit compile environment. 

64-bit calls 

Calls either from the new API or from the 64-bit compile environment. 

Large open 

An open operation by new API (open64), old API (open) with the O_LARGEFILE flag set, or old API
with 64-bit compile environment.

Small open 

An open operation by old API (open) with the O_LARGEFILE flag cleared.

_LARGEFILE_SOURCE

This compile option provides two new interfaces, ftello() and fseeko(), to the developer. ftello() and
fseeko() will behave just as ftell() and fseek() except that ftello() will return an off_t instead of a long
int and fseeko() will take an off_t as its second parameter instead of a long int.

_LARGEFILE64_SOURCE 

A compile option in the 32-bit compile environment; it must be specified to use the new API. This
compile option makes all of the new 64-bit specific interfaces and data types available to the
developer. These interfaces take the form interface64(). The 32-bit versions of ftello() and
fseeko() are also provided by the _LARGEFILE64_SOURCE compile flag.

_FILE_OFFSET_BITS 

This compile option specifies which compile environment the developer will use. If the compile
option is set to 64, then various data types, structures, and interfaces will be changed to their 64-bit
equivalents. If the compile option is set to 32, then these items will be set to 32 bits (32 bits is the
default). This compile option is used as _FILE_OFFSET_BITS=32 or _FILE_OFFSET_BITS=64.

MAX_SMALL_FILE 

This is a file size constant defined as 0x7fffffff (2 GB -1).